Job Scheduling for Multi-User MapReduce Clusters
Abstract
Sharing a MapReduce cluster between users is attractive because it enables statistical multiplexing (lowering costs) and allows users to share a common large data set. However, we find that traditional scheduling algorithms can perform very poorly in MapReduce due to two aspects of the MapReduce setting: the need for data locality (running computation where the data is) and the dependence between map and reduce tasks. We illustrate these problems through our experience designing a fair scheduler for MapReduce at Facebook, which runs a 600-node multi-user data warehouse on Hadoop. We developed two simple techniques, delay scheduling and copy-compute splitting, which improve throughput and response times by factors of 2 to 10. Although we focus on multi-user workloads, our techniques can also raise throughput in a single-user, FIFO workload by a factor of 2.
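The abstract names delay scheduling only in passing; as a rough illustration, the sketch below captures the core idea (when the job that should run next cannot launch a data-local task on a free node, skip it briefly rather than launching a non-local task). The Job and Task classes, the MAX_DELAY value, and the assign_task function are hypothetical simplifications for illustration, not the interface of the Hadoop Fair Scheduler; copy-compute splitting is not shown.

```python
# Minimal sketch of the delay-scheduling idea, with made-up data structures.
import time

MAX_DELAY = 5.0  # assumed: how long a job may wait for a data-local slot (seconds)

class Task:
    def __init__(self, input_locations):
        self.input_locations = set(input_locations)  # nodes holding this task's input block

class Job:
    def __init__(self, name):
        self.name = name
        self.pending_tasks = []   # tasks not yet launched
        self.skip_start = None    # when this job was first skipped for lack of locality

    def local_task(self, node):
        """Return a pending task whose input lives on `node`, if any."""
        for t in self.pending_tasks:
            if node in t.input_locations:
                return t
        return None

def assign_task(free_node, jobs_in_fair_share_order):
    """Pick a (job, task) pair for `free_node`, preferring data-local tasks."""
    now = time.time()
    for job in jobs_in_fair_share_order:
        if not job.pending_tasks:
            continue
        task = job.local_task(free_node)
        if task is not None:
            job.skip_start = None          # got locality, reset the clock
            return job, task
        if job.skip_start is None:
            job.skip_start = now           # start waiting for a local slot
        elif now - job.skip_start > MAX_DELAY:
            job.skip_start = None
            return job, job.pending_tasks[0]  # waited long enough, run non-locally
        # otherwise skip this job for now and let a later job use the slot
    return None
```

The intuition is that on a busy cluster slots free up frequently, so waiting a few seconds for a local slot usually costs far less than reading the input block over the network.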
Similar Resources
Resource-Aware Adaptive Scheduling for MapReduce Clusters
We present a resource-aware scheduling technique for MapReduce multi-job workloads that aims at improving resource utilization across machines while observing completion time goals. Existing MapReduce schedulers define a static number of slots to represent the capacity of a cluster, creating a fixed number of execution slots per machine. This abstraction works for homogeneous workloads, but fai...
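To make the contrast in this abstract concrete, the hypothetical sketch below compares slot-based admission (a task may run whenever a slot is free, regardless of its actual demands) with a resource-aware check against the machine's remaining memory and CPU. The classes, field names, and thresholds are assumptions for illustration, not the cited scheduler's API.

```python
# Hypothetical contrast between fixed-slot admission and resource-aware admission.
from dataclasses import dataclass

@dataclass
class Machine:
    free_slots: int          # slot-based view: capacity as a fixed per-machine count
    free_mem_mb: int         # resource-aware view: memory actually left
    free_cpu_cores: float    # resource-aware view: CPU actually left

@dataclass
class TaskDemand:
    mem_mb: int
    cpu_cores: float

def fits_slot_based(machine: Machine) -> bool:
    # Admits a task whenever a slot is free, even if memory is already exhausted.
    return machine.free_slots > 0

def fits_resource_aware(machine: Machine, task: TaskDemand) -> bool:
    # Admits a task only if its estimated demand fits what the machine has left.
    return (task.mem_mb <= machine.free_mem_mb and
            task.cpu_cores <= machine.free_cpu_cores)
```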
ShuffleWatcher: Shuffle-aware Scheduling in Multi-tenant MapReduce Clusters
MapReduce clusters are usually multi-tenant (i.e., shared among multiple users and jobs) for improving cost and utilization. The performance of jobs in a multi-tenant MapReduce cluster is greatly impacted by the all-Map-to-all-Reduce communication, or Shuffle, which saturates the cluster's hard-to-scale network bisection bandwidth. Previous schedulers optimize Map input locality but do not consid...
A Throughput Driven Task Scheduler for Batch Jobs in Shared MapReduce Environments
MapReduce is one of the most popular parallel data processing systems, and it has been widely used in many fields. As one of the most important techniques in MapReduce, the task scheduling strategy directly affects system performance. However, in multi-user shared MapReduce environments, the existing task scheduling algorithms cannot provide high system throughput when processing batch jo...
Queuing Network Models to Predict the Completion Time of the Map Phase of MapReduce Jobs
Big Data processing is generally defined as a situation when the size of the data itself becomes part of the computational problem. This has made divide-and-conquer type algorithms, implemented in clusters of multi-core CPUs in Hadoop/MapReduce environments, an important data processing tool for many organizations. Jobs of various kinds, which consist of a number of automatically parallelized ta...
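For a rough sense of the quantity such models predict, the simplest estimate counts the "waves" of map tasks that a fixed pool of map slots must execute. The sketch below is only that back-of-envelope calculation with made-up numbers, not the queuing network model the cited paper develops.

```python
# Back-of-envelope map-phase completion time: tasks run in waves over fixed slots.
import math

def map_phase_time_estimate(num_map_tasks: int, map_slots: int, avg_task_s: float) -> float:
    waves = math.ceil(num_map_tasks / map_slots)  # rounds of tasks the slots must run
    return waves * avg_task_s

# Example (assumed numbers): 2,000 map tasks on 500 slots at 30 s each -> 4 waves -> ~120 s.
print(map_phase_time_estimate(2000, 500, 30.0))
```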
A Relative Study on Task Schedulers in Hadoop MapReduce
Hadoop is a framework for Big Data processing in distributed applications. A Hadoop cluster is built for running data-intensive distributed applications, and the Hadoop Distributed File System is its primary storage area for Big Data. MapReduce is a model to aggregate the tasks of a job. Task assignment is carried out by schedulers, which guarantee the fair allocation of resources among users. When a user su...
Publication date: 2009